sc algorithm
Interpretable label-free self-guided subspace clustering
Majority subspace clustering (SC) algorithms depend on one or more hyperparameters that need to be carefully tuned for the SC algorithms to achieve high clustering performance. Hyperparameter optimization (HPO) is often performed using grid-search, assuming that some labeled data is available. In some domains, such as medicine, this assumption does not hold true in many cases. One avenue of research focuses on developing SC algorithms that are inherently free of hyperparameters. For hyperparameters-dependent SC algorithms, one approach to label-independent HPO tuning is based on internal clustering quality metrics (if available), whose performance should ideally match that of external (label-dependent) clustering quality metrics. In this paper, we propose a novel approach to label-independent HPO that uses clustering quality metrics, such as accuracy (ACC) or normalized mutual information (NMI), that are computed based on pseudo-labels obtained from the SC algorithm across a predefined grid of hyperparameters. Assuming that ACC (or NMI) is a smooth function of hyperparameter values it is possible to select subintervals of hyperparameters. These subintervals are then iteratively further split into halves or thirds until a relative error criterion is satisfied. In principle, the hyperparameters of any SC algorithm can be tuned using the proposed method. We demonstrate this approach on several single- and multi-view SC algorithms, comparing the achieved performance with their oracle versions across six datasets representing digits, faces and objects. The proposed method typically achieves clustering performance that is 5% to 7% lower than that of the oracle versions. We also make our proposed method interpretable by visualizing subspace bases, which are estimated from the computed clustering partitions. This aids in the initial selection of the hyperparameter search space.
- Europe > Croatia > Zagreb County > Zagreb (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.68)
- Health & Medicine > Therapeutic Area (0.68)
- Health & Medicine > Diagnostic Medicine (0.67)
Subspace Clustering in Wavelet Packets Domain
Subspace clustering (SC) algorithms utilize the union of subspaces model to cluster data points according to the subspaces from which they are drawn. To better address separability of subspaces and robustness to noise we propose a wavelet packet (WP) based transform domain subspace clustering. Depending on the number of resolution levels, WP yields several representations instantiated in terms of subbands. The first approach combines original and subband data into one complementary multi-view representation. Afterward, we formulate joint representation learning as a low-rank MERA tensor network approximation problem. That is motivated by the strong representation power of the MERA network to capture complex intra/inter-view dependencies in corresponding self-representation tensor. In the second approach, we use a self-stopping computationally efficient method to select the subband with the smallest clustering error on the validation set. When existing SC algorithms are applied to the chosen subband, their performance is expected to improve. Consequently, both approaches enable the re-use of SC algorithms developed so far. Improved clustering performance is due to the dual nature of subbands as representations and filters, which is essential for noise suppression. We exemplify the proposed WP domain approach to SC on the MERA tensor network and eight other well-known linear SC algorithms using six well-known image datasets representing faces, digits, and objects. Although WP domain-based SC is a linear method, it achieved clustering performance comparable with some best deep SC algorithms and outperformed many other deep SC algorithms by a significant margin. That is in particular case for the WP MERA SC algorithm. On the COIL100 dataset, it achieves an accuracy of 87.45% and outperforms the best deep SC competitor in the amount of 14.75%.
- Europe > Croatia > Zagreb County > Zagreb (0.04)
- North America > United States > Texas > Bexar County > San Antonio (0.04)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- (5 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Vision (0.92)
- Information Technology > Data Science > Data Quality > Data Transformation (0.84)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)
Revisiting data augmentation for subspace clustering
Abdolali, Maryam, Gillis, Nicolas
Subspace clustering is the classical problem of clustering a collection of data samples that approximately lie around several low-dimensional subspaces. The current state-of-the-art approaches for this problem are based on the self-expressive model which represents the samples as linear combination of other samples. However, these approaches require sufficiently well-spread samples for accurate representation which might not be necessarily accessible in many applications. In this paper, we shed light on this commonly neglected issue and argue that data distribution within each subspace plays a critical role in the success of self-expressive models. Our proposed solution to tackle this issue is motivated by the central role of data augmentation in the generalization power of deep neural networks. We propose two subspace clustering frameworks for both unsupervised and semi-supervised settings that use augmented samples as an enlarged dictionary to improve the quality of the self-expressive representation. We present an automatic augmentation strategy using a few labeled samples for the semi-supervised problem relying on the fact that the data samples lie in the union of multiple linear subspaces. Experimental results confirm the effectiveness of data augmentation, as it significantly improves the performance of general self-expressive models.
- Research Report (1.00)
- Overview (0.65)
Beyond Linear Subspace Clustering: A Comparative Study of Nonlinear Manifold Clustering Algorithms
Abdolali, Maryam, Gillis, Nicolas
Subspace clustering is an important unsupervised clustering approach. It is based on the assumption that the high-dimensional data points are approximately distributed around several low-dimensional linear subspaces. The majority of the prominent subspace clustering algorithms rely on the representation of the data points as linear combinations of other data points, which is known as a self-expressive representation. To overcome the restrictive linearity assumption, numerous nonlinear approaches were proposed to extend successful subspace clustering approaches to data on a union of nonlinear manifolds. In this comparative study, we provide a comprehensive overview of nonlinear subspace clustering approaches proposed in the last decade. We introduce a new taxonomy to classify the state-of-the-art approaches into three categories, namely locality preserving, kernel based, and neural network based. The major representative algorithms within each category are extensively compared on carefully designed synthetic and real-world data sets. The detailed analysis of these approaches unfolds potential research directions and unsolved challenges in this field.
- Asia > Middle East > Jordan (0.04)
- North America > United States > New York (0.04)
- North America > Canada > Alberta > Census Division No. 13 > Woodlands County (0.04)
- Europe > Belgium (0.04)
- Research Report (1.00)
- Overview (1.00)
Fused-Lasso Regularized Cholesky Factors of Large Nonstationary Covariance Matrices of Longitudinal Data
Dallakyan, Aramayis, Pourahmadi, Mohsen
Smoothness of the subdiagonals of the Cholesky factor of large covariance matrices is closely related to the degrees of nonstationarity of autoregressive models for time series and longitudinal data. Heuristically, one expects for a nearly stationary covariance matrix the entries in each subdiagonal of the Cholesky factor of its inverse to be nearly the same in the sense that sum of absolute values of successive terms is small. Statistically such smoothness is achieved by regularizing each subdiagonal using fused-type lasso penalties. We rely on the standard Cholesky factor as the new parameters within a regularized normal likelihood setup which guarantees: (1) joint convexity of the likelihood function, (2) strict convexity of the likelihood function restricted to each subdiagonal even when $n
- North America > United States > Texas > Brazos County > College Station (0.14)
- Europe > Austria > Vienna (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- (4 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- Information Technology > Data Science (0.66)
Sparse and Smooth: improved guarantees for Spectral Clustering in the Dynamic Stochastic Block Model
Keriven, Nicolas, Vaiter, Samuel
In this paper, we analyse classical variants of the Spectral Clustering (SC) algorithm in the Dynamic Stochastic Block Model (DSBM). Existing results show that, in the relatively sparse case where the expected degree grows logarithmically with the number of nodes, guarantees in the static case can be extended to the dynamic case and yield improved error bounds when the DSBM is sufficiently smooth in time, that is, the communities do not change too much between two time steps. We improve over these results by drawing a new link between the sparsity and the smoothness of the DSBM: the more regular the DSBM is, the more sparse it can be, while still guaranteeing consistent recovery. In particular, a mild condition on the smoothness allows to treat the sparse case with bounded degree. We also extend these guarantees to the normalized Laplacian, and as a by-product of our analysis, we obtain to our knowledge the best spectral concentration bound available for the normalized Laplacian of matrices with independent Bernoulli entries.
- North America > United States (0.14)
- Asia > Middle East > Jordan (0.04)
- Europe > France > Bourgogne-Franche-Comté > Côte-d'Or > Dijon (0.04)
- Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
- Information Technology > Data Science > Data Mining (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)
Theory of Spectral Method for Union of Subspaces-Based Random Geometry Graph
Union of Subspaces (UoS) model serves as an important model i n statistical machine learning. Briefly speaking, UoS models those high-dimensional da ta, encountered in many real-world problems, which lie close to low-dimensional subspaces corresponding to several classes to which the data belong, such as handwritten digits (Hasti e and Simard, 1998), face images (Basri and Jacobs, 2003), DNA microarray data (Parvare sh et al., 2008), and hyper-spectral images (Chen et al., 2011), to name just a few. A fund amental task in processing data points in UoS is to cluster these data points, which is kn own as Subspace Clustering (SC). Applications of SC has spanned all over science and eng ineering, including motion segmentation (Costeira and Kanade, 1998; Kanatani, 2001), face recognition (Wright et al., 2008), and classification of diseases (McWilliams and Monta na, 2014) and so on. We refer the reader to the tutorial paper (Vidal, 2011) for a review of the development of SC. The authors are with Department of Electronic Engineering, Tsinghua University, Beijing 100084, China. The corresponding author of this paper is Y. Gu (gyt@tsinghu a.edu.cn).
- Asia > China > Beijing > Beijing (0.24)
- North America > United States > Montana (0.04)
- Asia > Middle East > Jordan (0.04)
Vector Quantized Spectral Clustering applied to Soybean Whole Genome Sequences
Shastri, Aditya A., Ahuja, Kapil, Ratnaparkhe, Milind B., Shah, Aditya, Gagrani, Aishwary, Lal, Anant
We develop a Vector Quantized Spectral Clustering (VQSC) algorithm that is a combination of Spectral Clustering (SC) and Vector Quantization (VQ) sampling for grouping Soybean genomes. The inspiration here is to use SC for its accuracy and VQ to make the algorithm computationally cheap (the complexity of SC is cubic in-terms of the input size). Although the combination of SC and VQ is not new, the novelty of our work is in developing the crucial similarity matrix in SC as well as use of k-medoids in VQ, both adapted for the Soybean genome data. We compare our approach with commonly used techniques like UPGMA (Un-weighted Pair Graph Method with Arithmetic Mean) and NJ (Neighbour Joining). Experimental results show that our approach outperforms both these techniques significantly in terms of cluster quality (up to 25% better cluster quality) and time complexity (order of magnitude faster).
- Asia > India > Madhya Pradesh (0.05)
- North America > United States > New York (0.04)
- Asia > Middle East > Jordan (0.04)
- Overview (0.68)
- Research Report > New Finding (0.34)
Large-scale subspace clustering using sketching and validation
Traganitis, Panagiotis A., Slavakis, Konstantinos, Giannakis, Georgios B.
The nowadays massive amounts of generated and communicated data present major challenges in their processing. While capable of successfully classifying nonlinearly separable objects in various settings, subspace clustering (SC) methods incur prohibitively high computational complexity when processing large-scale data. Inspired by the random sampling and consensus (RANSAC) approach to robust regression, the present paper introduces a randomized scheme for SC, termed sketching and validation (SkeVa-)SC, tailored for large-scale data. At the heart of SkeVa-SC lies a randomized scheme for approximating the underlying probability density function of the observed data by kernel smoothing arguments. Sparsity in data representations is also exploited to reduce the computational burden of SC, while achieving high clustering accuracy. Performance analysis as well as extensive numerical tests on synthetic and real data corroborate the potential of SkeVa-SC and its competitive performance relative to state-of-the-art scalable SC approaches. Keywords: Subspace clustering, big data, kernel smoothing, randomization, sketching, validation, sparsity.